fix: split if block too big during append #16435

Merged: 15 commits merged on Sep 14, 2024

@zhyass (Member) commented Sep 10, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  1. Refactor TransformCompact:

    • TransformCompact was split into two components to make the compaction process more modular and efficient:
      • BlockCompactBuilder: builds the compaction tasks.
      • TransformCompactBlock: executes the compaction tasks in parallel.
  2. Improve the compaction logic:

    • The logic was adjusted so that compaction no longer writes excessively large data blocks.
    • The new structure performs compaction as early as possible during the data-writing phase.
  3. Control block size:

    • Block sizes are tuned during compaction so that the resulting blocks are neither too small nor too large, since both extremes hurt storage efficiency and read performance.
  4. Replace HashMap with BTreeMap in the reclustering function fetch_max_depth, so that reclustering behaves consistently from run to run (BTreeMap iterates in key order).

  5. Compact source data blocks before reclustering, for better performance and better clustering.
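
The build/execute split in items 1 to 3 can be illustrated with a minimal Rust sketch. Everything in it is hypothetical: `Block`, `Thresholds`, `build_tasks`, and `execute_tasks` are illustrative names, not the actual `BlockCompactBuilder` or `TransformCompactBlock` APIs, and block size is measured in rows only for brevity (assuming `min_rows <= max_rows`). The builder splits any block larger than `max_rows`, groups small pieces and emits a task as soon as a group reaches `min_rows`, and the executor merges each task independently on scoped threads.

```rust
use std::thread;

/// Hypothetical stand-in for a data block; only its row count matters here.
#[derive(Debug, Clone, PartialEq)]
struct Block {
    rows: usize,
}

/// Hypothetical size limits, measured in rows only (assumes min_rows <= max_rows).
#[derive(Debug, Clone, Copy)]
struct Thresholds {
    min_rows: usize,
    max_rows: usize,
}

/// Stage 1 (builder role): turn incoming blocks into compaction tasks.
/// Each task is a group of blocks that will be merged into one output block.
fn build_tasks(input: Vec<Block>, t: Thresholds) -> Vec<Vec<Block>> {
    let mut tasks: Vec<Vec<Block>> = Vec::new();
    let mut pending: Vec<Block> = Vec::new();
    let mut pending_rows: usize = 0;

    for block in input {
        // Split an oversized block into pieces of at most `max_rows` rows.
        let mut pieces = Vec::new();
        let mut remaining = block.rows;
        while remaining > t.max_rows {
            pieces.push(Block { rows: t.max_rows });
            remaining -= t.max_rows;
        }
        pieces.push(Block { rows: remaining });

        for piece in pieces {
            if piece.rows >= t.min_rows {
                // Already large enough: becomes a task of its own.
                tasks.push(vec![piece]);
                continue;
            }
            // Flush first if this small piece would push the group past max_rows.
            if pending_rows + piece.rows > t.max_rows {
                tasks.push(std::mem::take(&mut pending));
                pending_rows = 0;
            }
            pending_rows += piece.rows;
            pending.push(piece);
            // Emit the group as soon as it reaches min_rows ("as early as possible").
            if pending_rows >= t.min_rows {
                tasks.push(std::mem::take(&mut pending));
                pending_rows = 0;
            }
        }
    }
    // Leftover small blocks at the end of the input form a final task.
    if !pending.is_empty() {
        tasks.push(pending);
    }
    tasks
}

/// Stage 2 (executor role): merge each task into a single block.
/// Tasks are independent, so each one runs on its own scoped thread.
fn execute_tasks(tasks: Vec<Vec<Block>>) -> Vec<Block> {
    thread::scope(|s| {
        let handles: Vec<_> = tasks
            .into_iter()
            .map(|task| s.spawn(move || Block { rows: task.iter().map(|b| b.rows).sum() }))
            .collect();
        // Join in spawn order so the output order matches the task order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let t = Thresholds { min_rows: 80, max_rows: 100 };
    let input = vec![Block { rows: 30 }, Block { rows: 250 }, Block { rows: 20 }, Block { rows: 90 }];
    let out = execute_tasks(build_tasks(input, t));
    let sizes: Vec<usize> = out.iter().map(|b| b.rows).collect();
    println!("{:?}", sizes); // [100, 100, 80, 90, 20]
}
```

Here the 250-row block is split into 100 + 100 + 50, the 50-row tail is merged with the earlier 30-row block, and only the final 20-row remainder stays below `min_rows`. Spawning one thread per task keeps the sketch short; a real pipeline would more likely bound its parallelism.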

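Item 4 relies on the fact that Rust's `HashMap` (with the default `RandomState` hasher) iterates in an order that can change from run to run, while `BTreeMap` always iterates in ascending key order. The sketch below is hypothetical and is not the real `fetch_max_depth`; it only shows how the map type decides which point wins when several points share the maximum overlap depth. The input is assumed to be inclusive `(min, max)` cluster-key ranges of blocks, and `max_depth_point` is an illustrative name.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: each block covers an inclusive cluster-key range [min, max].
/// For every range endpoint, count how many blocks overlap it, then return the
/// point with the greatest overlap ("depth") together with that depth.
fn max_depth_point(ranges: &[(i64, i64)]) -> Option<(i64, usize)> {
    // BTreeMap keeps keys sorted, so iteration (and therefore tie-breaking)
    // is identical on every run. With a default HashMap the iteration order
    // is randomized per process, so a tie could be resolved differently each time.
    let mut depth: BTreeMap<i64, usize> = BTreeMap::new();
    for &(lo, hi) in ranges {
        depth.insert(lo, 0);
        depth.insert(hi, 0);
    }
    for (point, d) in depth.iter_mut() {
        *d = ranges
            .iter()
            .filter(|&&(lo, hi)| lo <= *point && *point <= hi)
            .count();
    }
    // Keep the first point (in ascending key order) with the maximum depth.
    let mut best: Option<(i64, usize)> = None;
    for (&point, &d) in &depth {
        if best.map_or(true, |(_, best_d)| d > best_d) {
            best = Some((point, d));
        }
    }
    best
}

fn main() {
    let ranges: [(i64, i64); 4] = [(1, 5), (2, 6), (8, 10), (9, 12)];
    // Points 2, 5, 9 and 10 all have depth 2; BTreeMap order always picks 2.
    println!("{:?}", max_depth_point(&ranges)); // Some((2, 2))
}
```

Swapping `BTreeMap` for `HashMap` in this sketch would compute the same depths, but the returned point could vary between 2, 5, 9 and 10 across runs.
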
Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@zhyass marked this pull request as draft September 10, 2024 17:42
@dosubot (bot) added the size:L label (This PR changes 100-499 lines, ignoring generated files.) Sep 10, 2024
@github-actions (bot) added the pr-bugfix label (this PR patches a bug in codebase) Sep 10, 2024

@dosubot (bot) added the A-query label (Area: databend query) Sep 10, 2024
@zhyass added the ci-benchmark label (Benchmark: run all test) Sep 10, 2024

@zhyass added and removed the ci-benchmark label Sep 11, 2024

@zhyass marked this pull request as ready for review September 11, 2024 11:25
@zhyass added and removed the ci-benchmark label Sep 11, 2024
Contributor

Docker Image for PR

  • tag: pr-16435-084e520-1726056170

note: this image tag is only available for internal use,
please check the internal doc for more details.

@dosubot (bot) added the size:XL label (This PR changes 500-999 lines, ignoring generated files.) and removed size:L Sep 11, 2024
@zhyass marked this pull request as draft September 11, 2024 17:07
Contributor

Docker Image for PR

  • tag: pr-16435-7dac011-1726076159

note: this image tag is only available for internal use,
please check the internal doc for more details.

@zhyass removed the ci-benchmark label Sep 12, 2024
@zhyass marked this pull request as ready for review September 12, 2024 15:05
@dosubot (bot) added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) and removed size:XL Sep 12, 2024
@dosubot (bot) added the lgtm label (This PR has been approved by a maintainer) Sep 14, 2024
@dantengsky added this pull request to the merge queue Sep 14, 2024
@BohuTANG removed this pull request from the merge queue due to a manual request Sep 14, 2024
@BohuTANG merged commit 9f0b15b into databendlabs:main Sep 14, 2024
84 checks passed
Labels
  • A-query: Area: databend query
  • lgtm: This PR has been approved by a maintainer
  • pr-bugfix: this PR patches a bug in codebase
  • size:XXL: This PR changes 1000+ lines, ignoring generated files.

3 participants